An Analysis of Generalization and Regularization in Nonlinear Learning Systems

Author

  • John E. Moody
Abstract

We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The principal result is the following relationship (computed to second order) between the expected test set and training set errors:

⟨E_test⟩ ≈ ⟨E_train⟩ + 2 σ̂²_eff p_eff(λ) / n .    (1)

Here, n is the size of the training sample ξ(n), σ̂²_eff is the effective noise variance in the response variable(s), λ is a regularization or weight decay parameter, and p_eff(λ) is the effective number of parameters in the nonlinear model. The expectations ⟨·⟩ of training set and test set errors are taken over possible training sets ξ(n) and over training and test sets respectively. The effective number of parameters p_eff(λ) usually differs from the true number of model parameters p for nonlinear or regularized models; this theoretical conclusion is supported by Monte Carlo experiments. In addition to the surprising result that p_eff(λ) ≠ p, we propose an estimate of (1) called the generalized prediction error (GPE), which generalizes well-established estimates of prediction risk, such as Akaike's FPE and AIC, Mallows' C_P, and Barron's PSE, to the nonlinear setting. GPE and p_eff were introduced previously in an earlier paper by Moody.

Background and Motivation

Many of the nonlinear learning systems of current interest for adaptive control, adaptive signal processing, and time series prediction are supervised learning systems of the regression type. Understanding the relationship between generalization performance and training error, and being able to estimate the generalization performance of such systems, is of crucial importance. We will take the prediction risk (expected test set error) as our measure of generalization performance.

Learning from Examples

Consider a set of n real-valued input/output data pairs ξ(n) = {ξ_i = (x_i, y_i); i = 1, …, n} drawn from a stationary density Ω(ξ). The observations can be viewed as being generated according to the signal-plus-noise model

y_i = μ(x_i) + ε_i ,

where y_i is the observed response (dependent variable), the x_i are the independent variables sampled with input probability density Ω(x), the ε_i are independent, identically distributed (iid) noise values sampled with density Φ(ε) having mean 0 and variance σ²_ε, and μ(x) is the conditional mean, an unknown function. From the signal-plus-noise perspective, the density Ω(x, y) can be represented as the product of two components, the conditional density Φ(y|x) and the input density Ω(x):

Ω(x, y) = Φ(y|x) Ω(x) = Φ(y − μ(x)) Ω(x) .

The learning problem is then to find an estimate μ̂(x) of the conditional mean μ(x) on the basis of the training set ξ(n).

In many real-world problems, few a priori assumptions can be made about the functional form of μ(x). Since a parametric function class is usually not known, one must resort to a nonparametric regression approach, whereby one constructs an estimate μ̂(x) = f(x) of μ(x) from a large class of functions F known to have good approximation properties; for example, F could be all possible radial basis function networks and multilayer perceptrons. The class of approximation functions is usually the union of a countable set of subclasses (specific network architectures, for example a fully connected two-layer perceptron with five internal units) A ⊂ F, for which the elements of each subclass f(w, x) ∈ A are continuously parametrized by a set of p = p(A) weights w = {w_α; α = 1, …, p}. The task of finding the estimate f(x) thus consists of two problems: choosing the best architecture Â, and choosing the best set of weights ŵ given the architecture.

The assumption of additive noise ε which is independent of x is standard and not overly restrictive; many other conceivable signal/noise models can be transformed into this form. For example, the multiplicative model y = μ(x) ε becomes y′ = μ′(x) + ε′ for the transformed variable y′ = log(y). Note that we have made only a minimal assumption about the noise ε: that it has finite variance independent of x. Specifically, we do not need to assume that the noise density Φ(ε) is of known form (e.g. gaussian) for the following development.

Note that in the nonparametric setting there does not typically exist a function f(w, x) ∈ F with a finite number of parameters such that f(w, x) = μ(x) for arbitrary x. For this reason, the estimators μ̂(x) = f(ŵ, x) will be biased estimators of μ(x).

The first problem (finding the architecture Â) requires a search over possible architectures (e.g. network sizes and topologies), usually starting with small architectures and then considering larger ones. By necessity, the search is not usually exhaustive and must use heuristics to reduce search complexity. (A heuristic search procedure for two-layer networks is presented in Moody and Utans.)

The second problem, finding a good set of weights for f(w, x), is accomplished by minimizing an objective function:

ŵ = argmin_w U(λ, w, ξ(n)) .

The objective function U consists of an error function plus a regularizer:

U(λ, w, ξ(n)) = n E_train(w, ξ(n)) + λ S(w) .

Here, the error E_train(w, ξ(n)) measures the "distance" between the target response values y_i and the fitted values f(w, x_i).
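As a concrete illustration of minimizing an objective of the form U = n·E_train(w) + λ·S(w), the sketch below fits a linear-in-parameters model with a weight-decay regularizer S(w) = Σ_α w_α² by plain gradient descent. The model, data, learning rate, and iteration count are illustrative assumptions, not details from the paper.

```python
# Sketch: gradient descent on U(lambda, w) = n * E_train(w) + lambda * S(w),
# for an illustrative linear model f(w, x) = w0 + w1 * x and weight decay
# S(w) = w0^2 + w1^2. All hyperparameters here are arbitrary choices.

def fit(xs, ys, lam=0.1, lr=0.01, steps=2000):
    n = len(xs)
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        # gradient of n * E_train(w) = sum_i (f(w, x_i) - y_i)^2
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = (w0 + w1 * x) - y
            g0 += 2.0 * r
            g1 += 2.0 * r * x
        # add the gradient of the regularizer lambda * S(w)
        g0 += 2.0 * lam * w0
        g1 += 2.0 * lam * w1
        # simultaneous update, scaled by 1/n for a stable step size
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# Noise-free line y = 1 + 2x: weight decay shrinks the fit slightly toward zero.
w0, w1 = fit([0.0, 0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0, 5.0])
```

Because λ > 0 penalizes weight magnitude, the minimizer of U is pulled slightly away from the unregularized least-squares solution; this shrinkage is exactly what makes the effective number of parameters p_eff(λ) smaller than p.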
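The signal-plus-noise model y_i = μ(x_i) + ε_i described above can be simulated directly. In this minimal sketch, the conditional mean μ, the input density (uniform), and the noise density (gaussian) are arbitrary illustrative choices; the development in the text requires only that the noise have zero mean and finite variance.

```python
import math
import random

# Sketch: draw n pairs (x_i, y_i) from the signal-plus-noise model
# y_i = mu(x_i) + eps_i. The specific mu and noise density below are
# illustrative assumptions, not taken from the paper.

def mu(x: float) -> float:
    """Illustrative 'unknown' conditional mean."""
    return math.sin(2.0 * math.pi * x)

def draw_sample(n: int, noise_std: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                 # x ~ Omega(x), here uniform on [0, 1)
        eps = rng.gauss(0.0, noise_std)  # iid noise, mean 0, variance noise_std^2
        data.append((x, mu(x) + eps))
    return data

sample = draw_sample(n=5)
```

A learner sees only `sample`; the task described in the text is to recover an estimate of `mu` from such pairs without knowing its functional form.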
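Finally, the second-order relationship between expected test and training error stated in the abstract is simple to apply numerically once estimates of the effective noise variance and effective number of parameters are in hand. The sketch below uses hypothetical values for both; the function name and arguments are assumptions for illustration.

```python
# Sketch: a test-error estimate from training error via
# <E_test> ~ <E_train> + 2 * sigma_eff^2 * p_eff / n.
# All numbers below are hypothetical, not from the paper's experiments.

def gpe(train_error: float, sigma_eff_sq: float, p_eff: float, n: int) -> float:
    """Generalized prediction error: estimated test error from training error."""
    return train_error + 2.0 * sigma_eff_sq * p_eff / n

# Hypothetical example: 100 training points, effective noise variance 0.25,
# 12.5 effective parameters (p_eff need not be an integer for regularized
# or nonlinear models).
estimate = gpe(train_error=0.30, sigma_eff_sq=0.25, p_eff=12.5, n=100)
print(round(estimate, 4))  # 0.3625
```

Note how the penalty term 2σ²_eff·p_eff/n shrinks as the sample size n grows, and grows with the effective model complexity p_eff, matching the intuition that flexible models on small samples generalize worst relative to their training error.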


Similar articles

Existence of solutions of infinite systems of integral equations in the Frechet spaces

In this paper we apply the technique of measures of noncompactness to the theory of infinite systems of integral equations in the Fréchet spaces. Our aim is to provide a generalization of the Tychonoff fixed point theorem and to prove the existence of solutions for infinite systems of nonlinear integral equations with the help of the technique of measures of noncompactness and a generalization of Tych...


Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation

Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation of nonlinear mathematical functions. However, the majority of the models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...



Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element in achieving sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting the target variable, which make the analysis of relations...





Publication date: 2006